Abstract: Polar codes provably achieve the capacity of a
wide array of channels under successive decoding. This assumes 
infinite precision arithmetic. Given the successive nature of the 
decoding algorithm, one might worry about the sensitivity of the 
performance to the precision of the computation. 

We show that even very coarsely quantized decoding algorithms
lead to excellent performance. More concretely, we show
that under successive decoding with an alphabet of cardinality
only three, the decoder still has a threshold and this threshold is
a sizable fraction of capacity. More generally, we show that if we
are willing to transmit at a rate $\delta$ below capacity, then we need
only $c \log(1/\delta)$ bits of precision, where $c$ is a universal constant.

I. Introduction 

Since the invention of polar codes by Arikan [1], a large
body of work has been done to investigate the pros and cons 
of polar codes in different practical scenarios (for a partial list 
see H-E)). 

We address one further aspect of polar codes using successive
decoding. We ask whether such a coding scheme is robust.
More precisely, the standard analysis of polar codes under
successive decoding assumes infinite precision arithmetic. Given
the successive nature of the decoder, one might worry how 
well such a scheme performs under a finite precision decoder. 
A priori it is not clear whether such a coding scheme still 
shows any threshold behavior and, even if it does, how the 
threshold scales in the number of bits of the decoder. 

We show that in fact polar coding is extremely robust with 
respect to the quantization of the decoder. In Figure 1 we
show the achievable rate using a simple successive decoder 
with only three messages, called the decoder with erasures, 
when transmission takes place over several important channel 
families. As one can see from this figure, in particular for 
channels with high capacity, the fraction of the capacity that 
is achieved by this simple decoder is close to 1, i.e., even this 
extremely simple decoder almost achieves capacity. We further 
show that, more generally, if we want to achieve a rate which
is below capacity by $\delta > 0$, then we need at most $c \log(1/\delta)$
bits of precision (all the logarithms in this paper are in base
2).

The significance of our observations goes beyond the pure 
computational complexity which is required. Typically, the 
main bottleneck in the implementation of large high speed 
coding systems is memory. Therefore, if one can find decoders
which work with only a few bits per message then
this can make the difference whether a coding scheme is
implementable or not.

Fig. 1. The maximum achievable rate, call it $C(W, Q)$, of a simple
three message decoder, called the decoder with erasures, as a function
of the capacity of the channel for different channel families. From top
to bottom: the first curve corresponds to the family of binary erasure
channels (BEC) where the decoder with erasures is equivalent to the
original SC decoder and, hence, the maximum achievable rate is the
capacity itself. The second curve corresponds to the family of binary
symmetric channels (BSC). The third curve corresponds to the family
of binary additive white Gaussian channels (BAWGN). The curve at
the bottom corresponds to a universal lower bound on the achievable
rate by the decoder with erasures.

A. Basic setting and definitions 

Let $W : \mathcal{X} \to \mathcal{Y}$ be a binary memoryless symmetric (BMS)
channel, with input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y}$,
and the transition probabilities $\{W(y \mid x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$.
Also, let $I(W)$ denote the capacity of $W$.

Let $G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. The generator matrix of polar codes is
defined through the Kronecker powers of $G_2$, denoted by
$G_N = G_2^{\otimes n}$. Throughout the paper, the variables $N$ and $n$
are related as $N = 2^n$. Let us review very briefly how the
generator matrix of polar codes is constructed. Consider the
$N \times N$ matrix $G_N$ and let us label the rows of the matrix
$G_N$ from top to bottom by $0, 1, \dots, N-1$. Now assume that
we desire to transmit binary data over the channel $W$ at rate
$R < I(W)$ with block-length $N$. One way to accomplish this
is to choose a subset $\mathcal{I} \subseteq \{0, \dots, N-1\}$ of size $NR$ and to
construct a vector $U_0^{N-1} = (U_0, \dots, U_{N-1})$ in a way that it
contains our $NR$ bits of data at positions in $\mathcal{I}$ and contains,
at positions not in $\mathcal{I}$, some fixed value (for example 0) which
is known to both the encoder and decoder. We then send the
codeword $X_0^{N-1} = U_0^{N-1} G_N$ through the channel $W$. We
refer to the set $\mathcal{I}$ as the set of chosen indices or information
indices and the set $\mathcal{I}^c$ is called the set of frozen indices. We
explain in Section II-A how the good indices are chosen. At
the decoder, the bits $u_0, \dots, u_{N-1}$ are decoded one by one.
That is, the bit $u_i$ is decoded after $u_0, \dots, u_{i-1}$. If $i$ is a frozen
index, its value is known to the decoder. If not, the decoder
estimates the value of $u_i$ by using the output $y_0^{N-1}$ and the
estimates of $u_0, \dots, u_{i-1}$.
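The encoding step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the plain Kronecker-power generator $G_N = G_2^{\otimes n}$ exactly as defined in the text; the function names `kron_power_g2` and `polar_encode` are hypothetical and chosen only for this example.

```python
def kron_power_g2(n):
    """Return G_N = G_2^{(x) n} over GF(2) as a list of rows, with N = 2**n."""
    g2 = [[1, 0], [1, 1]]
    g = [[1]]
    for _ in range(n):
        # Kronecker product g2 (x) g: block (a, b) equals g2[a][b] * g.
        g = [[g2[a][b] * x for b in range(2) for x in row]
             for a in range(2) for row in g]
    return g


def polar_encode(data_bits, info_indices, n):
    """Place data bits on the information indices, freeze the rest to 0,
    and return (u, x) with x = u * G_N over GF(2)."""
    size = 2 ** n
    g = kron_power_g2(n)
    u = [0] * size
    for idx, bit in zip(sorted(info_indices), data_bits):
        u[idx] = bit
    x = [sum(u[r] * g[r][c] for r in range(size)) % 2 for c in range(size)]
    return u, x


if __name__ == "__main__":
    u, x = polar_encode([1, 1], {1, 3}, n=2)
    print("u =", u, "x =", x)
```

The information set passed in here is arbitrary; how to pick it for a given channel and decoder is a separate question.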

B. Quantized SC decoder 

Let $\mathbb{R}^* = \mathbb{R} \cup \{\pm\infty\}$ and consider a function $Q(x) :
\mathbb{R}^* \to \mathbb{R}^*$ that is anti-symmetric (i.e., $Q(x) = -Q(-x)$).

We define the Q-quantized SC decoder as a version of the SC 
decoder in which the function Q is applied to the output of 
any computation that the SC decoder does. We denote such a 
decoder by $\mathrm{SCD}_Q$.

Typically, the purpose of the function Q is to model the 
case where we only have finite precision in our computations 
perhaps due to limited available memory or due to other 
hardware limitations. Hence, the computations are correct 
within a certain level of accuracy which the function Q 
models. Thus, let us assume that the range of $Q$ is a finite
set $\mathcal{Q}$ with cardinality $|\mathcal{Q}|$. As a result, all the messages
passed through the decoder $\mathrm{SCD}_Q$ belong to the set $\mathcal{Q}$.

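In this paper we consider a simple choice of the function
$Q$ that is specified by two parameters: the distance between
levels $\Delta$, and the truncation threshold $M$. Given a specific choice
of $M$ and $\Delta$, we define $Q$ as follows:

$$Q(x) = \begin{cases} \left\lfloor \frac{x}{\Delta} + \frac{1}{2} \right\rfloor \Delta, & x \in [-M, M], \\ \operatorname{sign}(x) M, & \text{otherwise.} \end{cases} \tag{1}$$

Note here that $|\mathcal{Q}| = 1 + \frac{2M}{\Delta}$.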
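As a small illustration of the quantizer in (1), the following Python sketch rounds to the nearest multiple of the level spacing inside the truncation window and clips outside it. The function name `quantize` and its argument names are assumptions made for this example. Note that the floor-based rule rounds exact half-level ties upward, so the anti-symmetry $Q(x) = -Q(-x)$ holds only away from such ties in this sketch.

```python
import math


def quantize(x, delta, big_m):
    """Quantizer of equation (1): nearest multiple of `delta` when x lies in
    [-big_m, big_m], otherwise clip to +/- big_m (also handles +/- infinity)."""
    if -big_m <= x <= big_m:
        return math.floor(x / delta + 0.5) * delta
    return math.copysign(big_m, x)


if __name__ == "__main__":
    for value in (-7.0, -1.6, -0.3, 0.3, 1.6, 7.0, float("inf")):
        print(value, "->", quantize(value, delta=1.0, big_m=2.0))
```

With `delta = 1.0` and `big_m = 2.0` the output alphabet has the five levels -2, -1, 0, 1, 2, which matches the count $1 + 2M/\Delta$ when $M$ is a multiple of $\Delta$.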

C. Summary of results 

Theorem 1 (Main Statement): Consider transmission over a
BMS channel $W$ of capacity $I(W)$ using polar codes and
a $\mathrm{SCD}_Q$ with message alphabet $\mathcal{Q}$. Let $C(W, Q)$ denote the
maximum rate at which reliable transmission is possible for
this setup.

(i) Let $|\mathcal{Q}| = 3$. Then there exist a computable decreasing
sequence $\{U_n\}_{n \in \mathbb{N}}$ (see (19)) and a computable
increasing sequence $\{L_n\}_{n \in \mathbb{N}}$ (see (20)), so that
$L_n \le C(W, Q) \le U_n$ and

$$\lim_{n \to \infty} L_n = \lim_{n \to \infty} U_n.$$

In other words, $U_n$ is an upper bound and $L_n$ is a lower
bound on the maximum achievable rate $C(W, Q)$ and for
increasing $n$ these two bounds converge to $C(W, Q)$.

(ii) To achieve an additive gap $\delta > 0$ to capacity $I(W)$, it
suffices to choose $\log |\mathcal{Q}| = c \log(1/\delta)$.

Discussion: In Figure 1 the value of $C(W, Q)$, $|\mathcal{Q}| = 3$, is
plotted as a function of $I(W)$ for different channel families
(for more details see Section II-D2). A universal lower bound
for the maximum achievable rate is also given in Figure 1. This
suggests that even for small values of $|\mathcal{Q}|$ polar codes are very
robust to quantization. In particular for channels with capacity
close to 1, very little is lost by quantizing. The methods used
here are extendable to other quantized decoders.

The rest of the paper is devoted to proving the first part
of Theorem 1. Due to space limitations, we have omitted the
proof of the second part of Theorem 1 as well as the proofs
of the lemmas stated in the sequel and we refer the reader to
[10] for more details.

II. General Framework for the Analysis 

A. Equivalent tree channel model and analysis of the probability
of error for the original SC decoder

Since we are dealing with a linear code, a symmetric channel
and symmetric decoders throughout this paper, without loss
of generality we confine ourselves to the all-zero codeword
(i.e., we assume that all the $u_i$'s are equal to 0). In order to
better visualize the decoding process, the following definition 
is handy. 

Definition 2 (Tree Channels of Height n): For each $i \in
\{0, 1, \dots, N-1\}$, we introduce the notion of the $i$-th tree
channel of height $n$ which is denoted by $T(i)$. Let $b_1 \dots b_n$
be the $n$-bit binary expansion of $i$. E.g., we have for $n = 3$,
$0 = 000$, $1 = 001$, $7 = 111$. With a slight abuse of
notation we use $i$ and $b_1 \dots b_n$ interchangeably. Note that for
our purpose it is slightly more convenient to denote the least
(most) significant bit as $b_n$ ($b_1$). Each tree channel consists
of $n + 1$ levels, namely $0, \dots, n$. It is a complete binary tree.
The root is at level $n$. At level $j$ we have $2^{n-j}$ nodes. For
$1 \le j \le n$, if $b_j = 0$ then all nodes on level $j$ are check
nodes; if $b_j = 1$ then all nodes on level $j$ are variable nodes.
Finally, we give a label for each node in the tree $T(i)$: For
each level $j$, we label the $2^{n-j}$ nodes at this level respectively
from left to right by $(j, 0), (j, 1), \dots, (j, 2^{n-j} - 1)$.

All nodes at level 0 correspond to independent observations
of the output of the channel $W$, assuming that the input is 0.

An example for $T(3)$ (that is $n = 3$, $b = 011$ and $i = 3$) is
shown in Fig. 2.

Fig. 2. Tree representation of the tree-channel $T(3)$. The 3-bit binary
expansion of 3 is $b_1 b_2 b_3 = 011$ (note that $b_1$ is the most significant bit).
The pair beside each node is the label assigned to it.

Given the channel output vector $y_0^{N-1}$ and assuming
that the values of the bits prior to $u_i$ are given, i.e.,
$u_0 = 0, \dots, u_{i-1} = 0$, we now compute the probabilities
$p(y_0^{N-1}, u_0^{i-1} \mid u_i = 0)$ and $p(y_0^{N-1}, u_0^{i-1} \mid u_i = 1)$ via a
simple message passing procedure on the equivalent tree
channel $T(i)$. We attach to each node in $T(i)$ with label $(j, k)$
a message$^1$ $m_{j,k}$ and we update the messages as we go up
towards the root node. We start with initializing the messages
at the leaf nodes of $T(i)$. For this purpose, it is convenient
to represent the channel in the log-likelihood domain; i.e., for
the node with label $(0, k)$ at the bottom of the tree, which
corresponds to an independent observation of the channel output,
we take the log-likelihood ratio (llr) $\log \frac{W(y_k \mid 0)}{W(y_k \mid 1)}$
as the initial message $m_{0,k}$. That is,

$$m_{0,k} = \log \frac{W(y_k \mid 0)}{W(y_k \mid 1)}. \tag{2}$$
Next, the SC decoder recursively computes the messages
(llr's) at each level via the following operations: If the nodes
at level $j$ are variable nodes (i.e., $b_j = 1$), we have

$$m_{j,k} = m_{j-1,2k} + m_{j-1,2k+1}, \tag{3}$$

and if the nodes at level $j$ are check nodes (i.e., $b_j = 0$), the
message that is passed up is

$$m_{j,k} = 2 \tanh^{-1}\left( \tanh\left(\frac{m_{j-1,2k}}{2}\right) \tanh\left(\frac{m_{j-1,2k+1}}{2}\right) \right). \tag{4}$$
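The recursion (3), (4) on the tree channel $T(i)$ can be sketched as follows. This is an illustrative, unquantized implementation under the conventions of Definition 2 (level $j$ uses bit $b_j$, with $b_1$ the most significant bit); the function names are hypothetical, and the small clipping constant is a numerical safeguard introduced here, not part of the analysis.

```python
import math


def variable_update(a, b):
    """Equation (3): llr's add at a variable node."""
    return a + b


def check_update(a, b):
    """Equation (4): 2 * atanh(tanh(a/2) * tanh(b/2)) at a check node."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    # Clip so that atanh stays finite when both inputs are very large.
    t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(t)


def root_llr(leaf_llrs, bits):
    """Return m_{n,0} for T(i); bits = [b_1, ..., b_n], b_1 most significant."""
    msgs = list(leaf_llrs)
    if len(msgs) != 2 ** len(bits):
        raise ValueError("need 2**n leaf messages for n bits")
    for b in bits:  # levels j = 1, ..., n
        rule = variable_update if b == 1 else check_update
        msgs = [rule(msgs[2 * k], msgs[2 * k + 1]) for k in range(len(msgs) // 2)]
    return msgs[0]


if __name__ == "__main__":
    # T(3) with n = 3: one check level followed by two variable levels.
    print(root_llr([1.0] * 8, [0, 1, 1]))
```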
In this way, it can be shown (see [1]) that the message that we
obtain at the root node is precisely the value

$$m_{n,0} = \log \frac{p(y_0^{N-1}, u_0^{i-1} \mid u_i = 0)}{p(y_0^{N-1}, u_0^{i-1} \mid u_i = 1)}. \tag{5}$$



Now, given $(y_0^{N-1}, u_0^{i-1})$, the value of $u_i$ is estimated as
follows. If $m_{n,0} > 0$ we let $\hat{u}_i = 0$. If $m_{n,0} < 0$ we let
$\hat{u}_i = 1$. Finally, if $m_{n,0} = 0$ we choose the value of $\hat{u}_i$ to
be either 0 or 1 with probability $\frac{1}{2}$. Thus, denoting $E_i$ as the
event that we make an error on the $i$-th bit within the above
setting, we obtain

$$\Pr(E_i) = \Pr(m_{n,0} < 0) + \frac{1}{2} \Pr(m_{n,0} = 0). \tag{6}$$

Given the description of $m_{n,0}$ in terms of a tree channel, it
is now clear that we can use density evolution [5] to compute
the probability density function of $m_{n,0}$. In this regard, at
each level $j$, the random variables $m_{j,k}$ are i.i.d. for $k \in
\{0, 1, \dots, 2^{n-j} - 1\}$. The distribution of the leaf messages
$m_{0,k}$ is the distribution of the variable $\log \frac{W(Y \mid 0)}{W(Y \mid 1)}$, where
$Y \sim W(y \mid 0)$. One can recursively compute the distribution
of $m_{j,k}$ in terms of the distribution of $m_{j-1,2k}$, $m_{j-1,2k+1}$ and
the type of the nodes at level $j$ (variable or check) by using
the relations (3), (4) with the fact that the random variables
$m_{j-1,2k}$ and $m_{j-1,2k+1}$ are i.i.d.

B. Quantized density evolution 

Let us now analyze the density evolution procedure for
the quantized decoder. For each label $(j, k)$ in $T(i)$, let $\hat{m}_{j,k}$
represent the messages at this label. The messages $\hat{m}_{j,k}$ take
their values in the discrete set $\mathcal{Q}$ (range of the function $Q$). It
is now easy to see that for the decoder $\mathrm{SCD}_Q$ the messages
evolve via the following relations. At the leaf nodes of the
tree we plug in the message $\hat{m}_{0,k} = Q\left(\log \frac{W(y_k \mid 0)}{W(y_k \mid 1)}\right)$, and
the update equation for $\hat{m}_{j,k}$ is

$$\hat{m}_{j,k} = Q(\hat{m}_{j-1,2k} + \hat{m}_{j-1,2k+1}), \tag{7}$$

if the node $(j, k)$ is a variable node and

$$\hat{m}_{j,k} = Q\left( 2 \tanh^{-1}\left( \tanh\left(\frac{\hat{m}_{j-1,2k}}{2}\right) \tanh\left(\frac{\hat{m}_{j-1,2k+1}}{2}\right) \right) \right), \tag{8}$$

if the node $(j, k)$ is a check node. One can use the density
evolution procedure to recursively obtain the densities of the
messages $\hat{m}_{j,k}$.

1 To simplify notation, we drop the dependency of the messages $m_{j,k}$ on
the position $i$ whenever it is clear from the context.

Finally, let $\hat{E}_i$ denote the event that we make an error in
decoding the $i$-th bit, with a further assumption that we have
correctly decoded the previous bits $u_0, \dots, u_{i-1}$. In a similar
way as in the analysis of the original SC decoder, we get

$$\Pr(\hat{E}_i) = \Pr(\hat{m}_{n,0} < 0) + \frac{1}{2} \Pr(\hat{m}_{n,0} = 0). \tag{9}$$

Hence, one way to choose the information bits for the algorithm
$\mathrm{SCD}_Q$ is to choose the bits $u_i$ according to the least
values of $\Pr(\hat{E}_i)$.

An important point to note here is that with the decoder 
SCDq, the distribution of the messages in the trees T(i) is 
different than the corresponding ones that result from the 
original SC decoder. Hence, the choice of the information 
indices is also specified by the choice of the function Q as 
well as the channel W. 

Note here that, since all of the densities take their values
in the finite alphabet $\mathcal{Q}$, the construction of such polar codes
can be efficiently done in time $O(|\mathcal{Q}|^2 N \log N)$. We refer
the reader to 0] for more details. 
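To make the index-selection rule concrete, here is a hedged Python sketch for the special case of a three-level alphabet, where a density is a triple $(p, e, m)$ of probabilities for the messages $\infty$, $0$, $-\infty$. It assumes the variable-node and check-node recursions of the decoder with erasures in Section II-D, and ranks indices by $\Pr(\hat{E}_i) = m + e/2$ as in (9). The function names are hypothetical, and the per-index recomputation is chosen for clarity, not for the complexity quoted above.

```python
def plus(d):
    """Variable-node density update for the three-level decoder."""
    p, e, m = d
    return (p * p + 2 * p * e, e * e + 2 * p * m, m * m + 2 * e * m)


def minus(d):
    """Check-node density update for the three-level decoder."""
    p, e, m = d
    return (p * p + m * m, 1.0 - (1.0 - e) ** 2, 2 * p * m)


def error_probability(d0, i, n):
    """Pr(E_i) from (9): run the density through T(i), b_1 (MSB) first."""
    d = d0
    for j in range(n):
        bit = (i >> (n - 1 - j)) & 1
        d = plus(d) if bit else minus(d)
    p, e, m = d
    return m + e / 2.0


def choose_information_indices(d0, n, k):
    """Return the k indices with the smallest Pr(E_i), in increasing order."""
    ranked = sorted(range(2 ** n), key=lambda i: error_probability(d0, i, n))
    return sorted(ranked[:k])


if __name__ == "__main__":
    bec_half = (0.5, 0.5, 0.0)  # BEC with erasure probability 1/2
    print(choose_information_indices(bec_half, n=3, k=4))
```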

C. Gallager Algorithm 

Since our aim is to show that polar codes under successive 
decoding are robust against quantization, let us investigate 
an extreme case. The perhaps simplest message-passing type 
decoder one can envision is the Gallager algorithm. It works 
with single-bit messages. Does this simple decoder have a
non-zero threshold? Unfortunately it does not, and this is easy to
see. We start with the equivalent tree-channel model. Consider
an arbitrary tree-channel T(i). Since messages are only a 
single bit, the "state" of the decoder at level j of T(i) can 
be described by a single non-negative number, namely the 
probability that the message at level j is incorrect. It is an 
easy exercise to show that at a level with check nodes the 
state becomes worse and at a level with variable nodes the 
state stays unchanged and hence no progress in the decoding 
is achieved, irrespective of the given tree. In other words, this 
decoder has a threshold of zero. The problem is the processing 
at the variable nodes since no progress is achieved there. But 
since we only have two possible incoming messages there is 
not much degree of freedom in the processing rules. 
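The "easy exercise" can be checked numerically with a short sketch. It assumes that the state $x$ is the probability that a one-bit message is wrong, that the two incoming messages are i.i.d., that a check node outputs the XOR (product of signs) of its inputs, and that a variable node breaks a disagreement with a fair coin. The tie-breaking rule is an assumption made for this example; the function names are hypothetical.

```python
def check_state(x):
    """Check node: the output is wrong iff exactly one input is wrong."""
    return 2.0 * x * (1.0 - x)


def variable_state(x):
    """Variable node: wrong if both inputs are wrong, or if the inputs
    disagree (probability 2x(1-x)) and the fair coin picks the wrong one."""
    return x * x + 0.5 * (2.0 * x * (1.0 - x))


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.3, 0.5):
        print(x, check_state(x), variable_state(x))
```

For every $x \in [0, 1/2]$ the check-node state is at least $x$ and the variable-node state equals $x$, so the error probability can never decrease along any tree.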



D. 1-Bit Decoder with Erasures 

Motivated by the previous example, let us now add one
message to the alphabet of the Gallager decoder, i.e., we
also add the possibility of having erasures. In this case $Q(x)$
becomes the sign function$^2$, i.e.,

$$Q(x) = \begin{cases} \infty, & x > 0, \\ 0, & x = 0, \\ -\infty, & x < 0. \end{cases} \tag{10}$$

As a result, all messages passed by the algorithm $\mathrm{SCD}_Q$ take
on only three possible values: $\{-\infty, 0, \infty\}$. In this regard, the
decoding procedure takes a very simple form. The algorithm
starts by quantizing the channel output to one of the three
values in the set $\mathcal{Q} = \{-\infty, 0, \infty\}$. At a check node we
take the product of the signs of the incoming messages
and at a variable node we have the natural addition rule
($0 \leftarrow \infty + (-\infty)$, $0 \leftarrow 0 + 0$, and $\infty \leftarrow \infty + \infty$, $\infty \leftarrow \infty + 0$,
and $-\infty \leftarrow -\infty + (-\infty)$, $-\infty \leftarrow -\infty + 0$). Note that on
the binary erasure channel, this algorithm is equivalent to the
original SC decoder.
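The three-message decoder is simple enough to write out directly. In the sketch below the values $+1$, $0$, $-1$ stand in for $\infty$, $0$, $-\infty$ (a representation chosen for this example), a check node multiplies its two inputs, and a variable node takes the sign of their sum, which reproduces the addition rule listed above. Function names are hypothetical.

```python
def sign(x):
    """Quantizer (10): +1, 0, -1 represent +infinity, 0, -infinity."""
    return (x > 0) - (x < 0)


def check_rule(a, b):
    """Check node: product of the signs; an erasure erases the output."""
    return a * b


def variable_rule(a, b):
    """Variable node: natural addition rule on {-inf, 0, +inf}."""
    return sign(a + b)


def erasure_decoder_root(leaf_llrs, bits):
    """Root message for T(i); bits = [b_1, ..., b_n], b_1 most significant."""
    msgs = [sign(v) for v in leaf_llrs]
    if len(msgs) != 2 ** len(bits):
        raise ValueError("need 2**n leaf values for n bits")
    for b in bits:
        rule = variable_rule if b == 1 else check_rule
        msgs = [rule(msgs[2 * k], msgs[2 * k + 1]) for k in range(len(msgs) // 2)]
    return msgs[0]


if __name__ == "__main__":
    llrs = [1.2, -0.3, 0.5, 0.7, 2.0, 0.1, -0.4, -0.9]
    print(erasure_decoder_root(llrs, [0, 1, 1]))
```

A root value of $+1$ corresponds to deciding 0, $-1$ to deciding 1, and $0$ to the tie case that is resolved by a fair coin.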

Our objective is now to compute the maximum reliable rate 
that the decoder SCDq can achieve for a BMS channel W. 
We denote this quantity by C(W, Q). The analysis is done in 
three steps: 

1) The density evolution procedure: To analyze the performance
of this algorithm, first note that since all our messages
take their values in the set $\mathcal{Q}$, all the random variables
that we consider have the following form

$$D = \begin{cases} \infty, & \text{w.p. } p, \\ 0, & \text{w.p. } e, \\ -\infty, & \text{w.p. } m. \end{cases} \tag{11}$$

Here, the numbers $p, e, m$ are probability values and $p + e +
m = 1$. Let us now see how the density evolves through the
tree-channels. For this purpose, one should trace the output
distribution of (7) and (8) when the input messages are two
i.i.d. copies of a r.v. $D$ with pdf as in (11).

Lemma 3: Given two i.i.d. versions of a r.v. $D$ with distribution
as in (11), the output of a variable node operation (7),
denoted by $D^+$, has the following form

$$D^+ = \begin{cases} \infty, & \text{w.p. } p^2 + 2pe, \\ 0, & \text{w.p. } e^2 + 2pm, \\ -\infty, & \text{w.p. } m^2 + 2em. \end{cases} \tag{12}$$

Also, the check node operation (8) yields $D^-$ as

$$D^- = \begin{cases} \infty, & \text{w.p. } p^2 + m^2, \\ 0, & \text{w.p. } 1 - (1 - e)^2, \\ -\infty, & \text{w.p. } 2pm. \end{cases} \tag{13}$$
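Lemma 3 can be double-checked by brute force: enumerate the nine pairs of incoming messages, apply the node rule, and accumulate probabilities. The sketch below does this with $+1$, $0$, $-1$ standing in for $\infty$, $0$, $-\infty$ (a representation assumed for this example) and compares the result with the closed forms (12) and (13). All names are hypothetical.

```python
from itertools import product


def sign(x):
    return (x > 0) - (x < 0)


def output_distribution(d, rule):
    """Distribution (p, e, m) of rule(A, B) for A, B i.i.d. with density d."""
    p, e, m = d
    prob = {1: p, 0: e, -1: m}
    out = {1: 0.0, 0: 0.0, -1: 0.0}
    for a, b in product(prob, repeat=2):
        out[rule(a, b)] += prob[a] * prob[b]
    return (out[1], out[0], out[-1])


def closed_form_plus(d):
    p, e, m = d
    return (p * p + 2 * p * e, e * e + 2 * p * m, m * m + 2 * e * m)  # (12)


def closed_form_minus(d):
    p, e, m = d
    return (p * p + m * m, 1.0 - (1.0 - e) ** 2, 2 * p * m)  # (13)


if __name__ == "__main__":
    d = (0.7, 0.2, 0.1)
    print(output_distribution(d, lambda a, b: sign(a + b)), closed_form_plus(d))
    print(output_distribution(d, lambda a, b: a * b), closed_form_minus(d))
```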



In order to compute the distribution of the messages $\hat{m}_{n,0}$
at a given level $n$, we use the method of [1] and define
the polarization process $D_n$ as follows. Consider the random
variable $L(Y) = \log \frac{W(Y \mid 0)}{W(Y \mid 1)}$, where $Y \sim W(y \mid 0)$. The
stochastic process $D_n$ starts from the r.v. $D_0 = Q(L(Y))$
defined as

$$D_0 = \begin{cases} \infty, & \text{w.p. } p = \Pr(L(Y) > 0), \\ 0, & \text{w.p. } e = \Pr(L(Y) = 0), \\ -\infty, & \text{w.p. } m = \Pr(L(Y) < 0), \end{cases} \tag{14}$$

and for $n \ge 0$

$$D_{n+1} = \begin{cases} D_n^+, & \text{w.p. } \frac{1}{2}, \\ D_n^-, & \text{w.p. } \frac{1}{2}, \end{cases} \tag{15}$$

where the plus and minus operations are given in (12), (13).

2) Analysis of the process $D_n$: Note that the output of
the process $D_n$ is itself a random variable of the form given in
(11). Hence, we can equivalently represent the process $D_n$
with a triple $(m_n, e_n, p_n)$, where the coupled processes $m_n$, $e_n$
and $p_n$ evolve according to the relations (12) and (13) and we
always have $m_n + e_n + p_n = 1$. Following along the same
lines as the analysis of the original SC decoder in [1], we
first claim that as $n$ grows large, the process $D_n$ becomes
polarized, i.e., the output of the process $D_n$ will almost surely
be a completely noiseless or a completely erasure channel.

Lemma 4: The random sequence $\{D_n = (p_n, e_n, m_n), n \ge
0\}$ converges almost surely to a random variable $D_\infty$ such that
$D_\infty$ takes its value in the set $\{(1,0,0), (0,1,0)\}$.

We now aim to compute the value of $C(W, Q) = \Pr(D_\infty =
(1,0,0))$, i.e., the highest rate that we can achieve with the
1-Bit Decoder with Erasures. In this regard, a convenient
approach is to find a function $f : \mathcal{D} \to \mathbb{R}$ such that
$f((0,1,0)) = 0$ and $f((1,0,0)) = 1$ and for any $D \in \mathcal{D}$

$$\frac{1}{2}\bigl(f(D^+) + f(D^-)\bigr) = f(D).$$

With such a function $f$, the process $\{f(D_n)\}_{n \ge 0}$ is a martingale
and consequently we have $\Pr(D_\infty = (1,0,0)) = f(D_0)$.
Therefore, by computing the deterministic quantity $f(D_0)$ we
obtain the value of $C(W, Q)$. However, finding a closed form
for such a function seems to be a difficult task$^3$. Instead, the
idea is to look for alternative functions, denoted by $g : \mathcal{D} \to \mathbb{R}$,
such that the process $g(D_n)$ is a super-martingale
(sub-martingale) and hence we can get a sequence of upper (lower)
bounds on the value of $\Pr(D_\infty = (1,0,0))$ as follows. Assume
we have a function $g : \mathcal{D} \to \mathbb{R}$ such that $g((0,1,0)) = 0$ and
$g((1,0,0)) = 1$ and for any $D \in \mathcal{D}$,

$$\frac{1}{2}\bigl(g(D^+) + g(D^-)\bigr) \le g(D). \tag{16}$$



Then, the process $\{g(D_n)\}_{n \ge 0}$ is a super-martingale and for
$n \ge 0$ we have

$$\Pr(D_\infty = (1,0,0)) \le \mathbb{E}[g(D_n)]. \tag{17}$$

The quantity $\mathbb{E}[g(D_n)]$ is decreasing in $n$ and by using Lemma 4
we have

$$\Pr(D_\infty = (1,0,0)) = \lim_{n \to \infty} \mathbb{E}[g(D_n)]. \tag{18}$$

2 Note here that we have further assumed that $M = \Delta$ and $\Delta \to 0$.

3 The function $f$ clearly exists as one trivial candidate for it is $f(D) =
\Pr(D_\infty = (1,0,0))$, where $D_\infty$ is the limiting r.v. that the process
$\{D_n\}_{n \ge 0}$ with starting value $D_0 = D$ converges to.
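For readers who want the step from (16) to (17) and (18) spelled out, the chain below is a short sketch. It additionally assumes that $g$ is bounded and continuous on $\mathcal{D}$, so that limit and expectation can be exchanged; this regularity assumption is introduced here for the sketch and is not stated explicitly in the text.

```latex
\begin{align*}
\mathbb{E}\bigl[g(D_{n+1}) \mid D_n\bigr]
  &= \tfrac{1}{2}\bigl(g(D_n^+) + g(D_n^-)\bigr) \le g(D_n)
  && \text{by (15) and (16)}, \\
\mathbb{E}\bigl[g(D_{n+1})\bigr]
  &\le \mathbb{E}\bigl[g(D_n)\bigr]
  && \text{taking expectations}, \\
\mathbb{E}\bigl[g(D_\infty)\bigr]
  &= 1 \cdot \Pr\bigl(D_\infty = (1,0,0)\bigr) + 0 \cdot \Pr\bigl(D_\infty = (0,1,0)\bigr)
  && \text{by Lemma 4}, \\
\Pr\bigl(D_\infty = (1,0,0)\bigr)
  &= \lim_{k \to \infty} \mathbb{E}\bigl[g(D_k)\bigr] \le \mathbb{E}\bigl[g(D_n)\bigr]
  && \text{for every } n \ge 0.
\end{align*}
```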



In a similar way, one can search for a function $h : \mathcal{D} \to \mathbb{R}$
with the same properties as $g$ except that the
inequality (16) holds in the opposite direction, and in a similar
way this leads us to computable lower bounds on $C(W, Q)$. It
remains to find some suitable candidates for $g$ and $h$. Let us first
note that a density $D$ as in (11) can be equivalently represented
as a simple BMS channel given in Fig. 3. This equivalence
stems from the fact that for such a channel, conditioned on
the event that the symbol $+1$ has been sent, the distribution
of the output is precisely $D$. With a slight abuse of notation,
we also denote the corresponding BMS channel by $D$.

Fig. 3. The equivalent channel for the density $D$ given in (11).

In particular, it is an easy exercise to show that the capacity
($I(D)$), the Bhattacharyya parameter ($Z(D)$) and the error
probability ($E(D)$) of the density $D$ are given as

$$I(D) = (m + p)\left(1 - h_2\left(\frac{p}{p + m}\right)\right), \quad Z(D) = 2\sqrt{pm} + e, \quad E(D) = 1 - p - \frac{e}{2},$$

where $h_2(\cdot)$ denotes the binary entropy function. Since
the function $Q$ is not an injective function, we have
$\frac{I(D^+) + I(D^-)}{2} \le I(D)$. This implies that the process $I_n =
I(D_n)$ is a bounded supermartingale. Furthermore, since
$I(D = (1,0,0)) = 1$ and $I(D = (0,1,0)) = 0$, we deduce
from Lemma 4 that $I_n$ converges a.s. to a 0-1 valued r.v.
$I_\infty$ and hence

$$C(W, Q) = \Pr(D_\infty = (1,0,0)) = \Pr(I_\infty = 1) = \mathbb{E}(I_\infty).$$
Now, from the fact that $I_n$ is a supermartingale, we obtain

$$C(W, Q) \le \mathbb{E}[I_n] \triangleq U_n, \tag{19}$$

for $n \in \mathbb{N}$. In a similar way, one can obtain a sequence of
lower bounds for $C(W, Q)$.

Lemma 5: Define the function $F(D)$ as $F(D) = p - 4\sqrt{pm}$
for $D \in \mathcal{D}$. We have $F(D = (1,0,0)) = 1$, $F(D =
(0,1,0)) = 0$ and $\frac{F(D^+) + F(D^-)}{2} \ge F(D)$.
Hence, the process $F_n = F(D_n)$ is a submartingale and for
$n \in \mathbb{N}$ we have

$$C(W, Q) \ge \mathbb{E}[F_n] \triangleq L_n. \tag{20}$$



Given a BMS channel $W$, one can numerically compute
$C(W, Q)$ with arbitrary accuracy using the sequences $L_n$ and
$U_n$ (see Figure 1). Also, for a channel $W$ with capacity $I(W)$
and error probability $E(W)$, we have

$$E(W) \le \frac{1 - I(W)}{2}. \tag{21}$$

Therefore, $\inf_{\{D : E(D) = \frac{1 - I(W)}{2}\}} C(D, Q) \le C(W, Q)$, which
leads to the universal lower bound obtained in Figure 1.

Example 6: Let the channel $W$ be a BSC with
crossover probability $\epsilon = 0.11$ (hence $I(W) \approx 0.5$). Using
(14) we obtain

$$D_0 = \begin{cases} \infty, & \text{w.p. } 1 - \epsilon = 0.89, \\ -\infty, & \text{w.p. } \epsilon = 0.11. \end{cases} \tag{22}$$

Therefore, we get $L_0 = F(D_0) = -0.361$ and $U_0 = I(D_0) =
0.5$. We can also compute $L_1 = \frac{F(D_0^+) + F(D_0^-)}{2} = -0.191$,
$U_1 = \frac{I(D_0^+) + I(D_0^-)}{2} = 0.5$ and

$$L_2 = \frac{F(D_0^{++}) + F(D_0^{+-}) + F(D_0^{-+}) + F(D_0^{--})}{4} = -0.075,$$

$$U_2 = \frac{I(D_0^{++}) + I(D_0^{+-}) + I(D_0^{-+}) + I(D_0^{--})}{4} = 0.498.$$

Continuing this way, one can find $L_{10} = 0.264$, $U_{10} = 0.474$
and $L_{20} = 0.398$, $U_{20} = 0.465$ and so on.
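The bounds in Example 6 can be recomputed with a few lines of Python. The sketch below enumerates all $2^n$ equally likely plus/minus paths of the process $D_n$ and averages $F$ and $I$ over them, using the density recursions of Lemma 3. It is a brute-force illustration (exponential in $n$), and the function names are hypothetical.

```python
import math


def plus(d):
    p, e, m = d
    return (p * p + 2 * p * e, e * e + 2 * p * m, m * m + 2 * e * m)


def minus(d):
    p, e, m = d
    return (p * p + m * m, 1.0 - (1.0 - e) ** 2, 2 * p * m)


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def capacity(d):
    """I(D) = (m + p) * (1 - h2(p / (p + m)))."""
    p, e, m = d
    s = p + m
    return s * (1.0 - h2(p / s)) if s > 0.0 else 0.0


def lower_fn(d):
    """F(D) = p - 4 * sqrt(p * m) from Lemma 5."""
    p, e, m = d
    return p - 4.0 * math.sqrt(max(p * m, 0.0))


def bounds(d0, n):
    """Return (L_n, U_n) by averaging over all 2**n paths of the process."""
    level = [d0]
    for _ in range(n):
        level = [step(d) for d in level for step in (plus, minus)]
    count = len(level)
    return (sum(lower_fn(d) for d in level) / count,
            sum(capacity(d) for d in level) / count)


if __name__ == "__main__":
    d0 = (0.89, 0.0, 0.11)  # BSC with crossover probability 0.11
    for n in range(3):
        print(n, bounds(d0, n))
```

For $n = 0, 1, 2$ the computed pairs agree with the values quoted in the example up to rounding in the third decimal place. Larger $n$ is feasible with the same code at a cost of $2^n$ densities per level.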

3) Scaling behavior and error exponent: In the last step,
we need to show that for rates below $C(W, Q)$ the block-error
probability decays to 0 for large block-lengths.

Lemma 7: Let $D \in \mathcal{D}$. We have

$$Z(D^-) \le 2 Z(D) \quad \text{and} \quad Z(D^+) \le 2 (Z(D))^{3/2}.$$

Hence, for transmission rate $R < C(W, Q)$ and block-length
$N = 2^n$, the probability of error of $\mathrm{SCD}_Q$, denoted by
$P_{e,Q}(N, R)$, satisfies $P_{e,Q}(N, R) = o(2^{-N^\beta})$ for $\beta <$